Computer  Science  Department 


TECHNICAL  REPORT 


A  SEMANTIC  APPROACH  TO  CORRECTNESS 
OF  CONCURRENT  TRANSACTION  EXECUTIONS 


By 




Paul  Spirakis  & 
Alexander  Tuzhilin 


Tech. Rep. #130
July 1984




NEW  YORK  UNIVERSITY 




Department  of  Computer  Science 
Courant  Institute  of  Mathematical  Sciences 

251  MERCER  STREET,  NEW  YORK,  N.Y.  10012 




This work is supported in part by the NSF grant MCS 83-00630.


Abstract 

One of the main issues in concurrency control is the question of
what  constitutes  a  legal  or  correct  behavior  of  a  group  of  transactions 
updating  the  database  simultaneously.  It  seems  that  the  undesirable 
effects  of  concurrent  transaction  executions  can  be  put  into  three 
classes:  violation  of  integrity  constraints,  inconsistent  outputs  to 
users  and  racing.  An  intuitive  way  to  define  correctness  of 
transaction  schedules  is  then  to  require  that  the  scheduler  avoid  all 
three  types  of  anomalies.  In  this  paper,  we  formalize  this  notion  of 
correctness.  To  do  this,  we  develop  a  new,  desirable,  semantic 
property  of  transaction  schedules,  which  we  call  independence.  Then, 
we  give  a  partial  answer  to  the  following  question:  Is  there  any 
intermediate  class  of  schedules,  between  the  classes  of  serializable 
and  correct  schedules,  that  has  an  easy  membership  test?  We  first  prove 
a  negative  result.  For  integrity  constraints  in  the  form  of  linear 
inequalities  and  for  linear  semantics  of  transaction  actions,  we  show 
that  the  serializable  schedules  are  the  only  class  of  schedules 
preserving  those  integrity  constraints.  However,  if  the  semantics  of 
transaction  actions  are  more  restricted,  then  there  exists  a  class  of 
nonserializable  schedules  (we  call  them  weakly  serializable  of  order  2) 
which  is  a  proper  subset  of  the  class  of  correct  schedules  and  has  an 
easy  membership  test. 
1. Introduction
One of the main issues in concurrency control is the question of
correctness, i.e., what constitutes a "legal" or "correct" behavior of a
group of transactions updating the database simultaneously. If
transactions are allowed to access the database in a nonsynchronized,
uncontrolled fashion, then they may interfere with each other, and
incorrect final results may be produced. These undesirable effects can
be classified into the following three groups:

(1) violations of database integrity constraints

(2) inconsistent outputs to the users

(3) racing

These  three  types  of  update  anomalies  are  amply  discussed  in  the 
literature  and  the  reader  is  referred  to  [Bern  80],  [Bern  81],  [Silb 
80],  [Ull  82],  [Papa  79]. 

A  natural  and  intuitive  way  to  define  correctness  is  by  requiring 
elimination  of  all  three  types  of  anomalies.  This  approach  was  taken 
in  [Gars  83]  and  is  also  taken  in  this  paper.  However,  the  notion  of 
serializability has been taken as the formal counterpart of correctness
by many researchers ([Bern 79], [Esw 76], [Papa 79], [Schl 78], [Ull
82]). Several researchers ([Casa 80], [Gars 83], [Fisc 82], [Kung 79],
[Schl 78]) have pointed out that serializability is too restrictive a
notion and that meaningful results can be obtained by designing
schedules which are correct but not necessarily serializable.

However, little work has been done on characterizing the class of
"correct" schedules, for (at least) the following two reasons:
(1) Semantic information about transaction actions and integrity
constraints is required. As Kung and Papadimitriou show in [Kung 79],
serializability is the best we can do if only syntactic information is
available. More specifically, they prove that for any nonserializable
schedule there exists another schedule with the same syntax (but
different semantics) which violates the integrity constraints.

(2) It is difficult to recognize correct schedules.
Serializability is a syntactic notion. Algorithms which decide if a
schedule is serializable have been constructed in the past, based on
the idea of dependency graphs and polygraphs (see [Papa 77],
[Papa 78]). It is not clear how to test whether a given schedule is
semantically consistent. [Gars 83] proposes an interesting solution to
the concurrency control problem by introducing a scheduler that
produces semantically consistent schedules by utilizing the notion of
transaction types (also used in the SDD-1 system, see [Bern 80]).
However, his solution imposes certain decisions on the users, and the
problem of constructing efficient tests for particular semantics is still
largely open.

In  this  paper,  we  formalize  the  notion  of  correctness.  A  schedule 
is  correct  if  (1)  it  preserves  consistency  of  the  database,  (2)  each 
transaction  acts  as  if  it  were  isolated  from  the  other  transactions  (we 
call  this  property  independence)  and  (3)  racing  does  not  occur  in  the 
schedule.  All  these  notions  will  be  defined  later.  Then  we  give  a 
partial  answer  to  the  following  question:  Is  there  any  intermediate 
class between the classes of serializable and correct schedules that
has an easy membership test? We first prove a negative result: for
integrity constraints in the form of linear inequalities and for
semantics  of  transaction  actions  being  linear  combinations  of  previous 
values  of  data,  we  show  that  the  only  class  of  schedules  preserving 
these  integrity  constraints  is  the  class  of  serializable  schedules. 
However,  if  the  semantics  of  transaction  actions  are  more  restricted, 
then  there  exists  a  class  of  (nonserializable)  schedules  (we  call  them 
weakly  serializable  of  order  2)  which  is  a  proper  subset  of  the  class 
of  correct  schedules.  This  class  has  a  lot  of  other  interesting 
properties,  which  are  investigated  in  this  paper. 

2.  Basic  database  notions,  update  anomalies  and  correctness. 

2.1  Basic  notions 
A formal treatment of the basic notions can be found in [Kung 79]
and in an extended future version of this paper. We present a brief
informal overview of this material here. A database consists of a set
of objects, the sets of values these objects can take, and the set of
combinations of values that are allowed (integrity constraints, ICs).
Transactions are modeled as sequences of operations. Each transaction
has its own local variables; it reads inputs from the database and
writes outputs back to the database and to users (Fig. 1).
Figure 1

Each operation is defined as a mapping

$$E_{DB} \times E_{LW} \times E_{EW} \to E_{DB} \times E_{LW} \times E_{EW},$$

where $E_{DB}$, $E_{LW}$ and $E_{EW}$ are the states of the database, the states of the
local workspace and the states of the external world. A sequence of
operations  is  their  composition.  A  transaction  is  a  sequence  of 
operations  that  maps  any  consistent  state  of  the  database  to  a 
consistent  state.  A  transaction  system  is  a  collection  of 
transactions.  A  schedule  of  a  transaction  system  is  a  sequence  of  all 
operations  of  transactions  in  that  system,  executed  in  some  interleaved 
fashion  (which,  however,  preserves  the  relative  order  of  operations 
within  each  transaction).  Executions  of  transaction  systems  can  be 
defined  in  terms  of  states  and  program  counters,  in  a  way  similar  to 
"computations"  in  concurrent  programming. 

A schedule preserves integrity constraints if, for any initial
consistent state of the database, the final state of the database is
consistent. Let P be the class of integrity-constraint-preserving
schedules (the correct schedules of [Kung 79]).
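The notion of a schedule as an interleaving that preserves the order of operations within each transaction can be made concrete with a small sketch. This is an illustration under our own assumptions, not the formalism of [Kung 79]: each step is a hypothetical (transaction, operation) pair of strings, and the helper names `interleavings` and `is_schedule_of` are ours.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Step = Tuple[str, str]  # (transaction name, operation), e.g. ("T1", "rA")


def interleavings(t1: List[Step], t2: List[Step]) -> List[List[Step]]:
    """All schedules of two transactions that keep each one's internal order."""
    n = len(t1) + len(t2)
    schedules = []
    # Choose which positions of the merged sequence are occupied by t1's steps.
    for positions in combinations(range(n), len(t1)):
        slots = set(positions)
        it1, it2 = iter(t1), iter(t2)
        schedules.append([next(it1) if k in slots else next(it2) for k in range(n)])
    return schedules


def is_schedule_of(schedule: List[Step], system: Dict[str, List[Step]]) -> bool:
    """True iff `schedule` holds exactly the steps of every transaction in
    `system`, each transaction's steps in their original relative order."""
    if len(schedule) != sum(len(steps) for steps in system.values()):
        return False
    return all([s for s in schedule if s[0] == name] == steps
               for name, steps in system.items())
```

For two transactions of two steps each there are C(4, 2) = 6 such schedules; a sequence that puts T1's write before its read is rejected by `is_schedule_of`.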

2.2   Update  anomalies 
A  fundamental  assumption  of  this  paper   is   that   there   are   only 
three  types  of  inconsistencies  that  can  occur  during  concurrent  updates 
of  the  database: 

(1)  Transactions  leave  the  database  in  an   inconsistent   state 
(violation  of  ICs) 

(2)  Some  data  are  lost  during  concurrent  updates  (racing). 

(3)  Some   transactions   can  produce  inconsistent  outputs  to  the 
users. 

These   inconsistencies   are   discussed   at   length   in   the   literature 
([Bern  80],  [Bern  81],  [Silb  80],  [Ull  82],  [Gars  83]). 

The authors are not aware of any other anomalies. Therefore, the
class  of  correct  schedules  to  be  defined  later  must  eliminate  all  three 
types  of  inconsistencies  discussed  above.  We  now  discuss  each  type  of 
inconsistency  separately. 

2.2.1   IC  violations  (Type  1  inconsistencies) 

Definition.   A  schedule  violates  IC  if  there  is  an  initial  state  of  the 
database  for  which  the  final  state  of  the  database  does  not  satisfy  IC. 
Example

    T1                    T2
    rA
    A := A+1
                          rA
                          A := 2A
                          wA
    A := A-1
    wA

IC = "A = 0"

If the initial value of A is A0 = 0, then the final value is 2A0 + 1 = 1 ≠ 0.
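The arithmetic of this example can be replayed with a short sketch. It assumes (our reading, needed for the value 2A0 + 1) that each assignment acts on the stored value of A, in the time order T1: A := A+1, then T2: A := 2A, then T1: A := A-1. The names `SCHEDULE`, `run`, `run_serial` and `satisfies_ic` are hypothetical.

```python
from typing import Callable, List, Tuple

# (transaction, update applied to the stored value of A)
Update = Tuple[str, Callable[[int], int]]

# Assumed time order of the example's assignments, each acting on the stored A.
SCHEDULE: List[Update] = [
    ("T1", lambda a: a + 1),  # T1: A := A+1
    ("T2", lambda a: 2 * a),  # T2: A := 2A
    ("T1", lambda a: a - 1),  # T1: A := A-1
]


def run(updates: List[Update], a0: int) -> int:
    """Apply the updates in order, starting from A = a0."""
    a = a0
    for _txn, update in updates:
        a = update(a)
    return a


def run_serial(order: List[str], a0: int) -> int:
    """Run each transaction's updates to completion, in the given order."""
    return run([u for txn in order for u in SCHEDULE if u[0] == txn], a0)


def satisfies_ic(a: int) -> bool:
    return a == 0  # IC: "A = 0"
```

Starting from A0 = 0, `run(SCHEDULE, 0)` gives 1, whereas both serial orders T1T2 and T2T1 give 0, so only the interleaved execution violates the IC.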

Clearly, the schedules in P, which preserve the integrity constraints,
have no inconsistencies of type 1.

2.2.2   Racing  (Type  2  inconsistencies) 

A  Racing  Condition  is  a  situation  where  some  data  are  lost  as  a 
result  of  concurrent  execution  of  the  transactions  of  a  transaction 
system. 

Example:

T1:   rA ; A := f1(A) ; wA

T2:   rA ; A := f2(A) ; wA

The value written by T1 to A is lost: T2 reads A before T1 writes it and
writes A after T1 does, so T1's update is overwritten.

As  defined  above,  racing  is  an  informal  notion. 
We  adopt  here  the  following  formal  counterpart: 

Definition: A schedule S has a racing condition if there exist two
transactions T1 and T2 in S and an item A in the database such that
the sequence of operations on A by T1 and T2, as it appears in S, is
not syntactically the same as the sequence of operations on A in
either of the serial schedules T1T2 or T2T1.

Note  that  racing  is  a  syntactic  notion.  Also,  notice  how  this 
definition  applies  to  the  example  of  racing  given  above. 

Definition.    Let  NR  be   the   class   of  schedules  which  do  not  have  a 
racing  condition. 
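The racing definition is purely syntactic, so it translates directly into a check. The sketch below assumes a schedule is a list of hypothetical (transaction, "r" or "w", item) triples; for every pair of transactions and every item it compares the projected sequence with the two serial orders. The function name `has_racing` is ours; membership in NR is then `not has_racing(s)`.

```python
from itertools import combinations
from typing import List, Tuple

Op = Tuple[str, str, str]  # (transaction, "r" or "w", item)


def has_racing(schedule: List[Op]) -> bool:
    """True iff, for some pair of transactions and some item, the operations of
    the pair on that item, in schedule order, match neither serial order."""
    txns = sorted({t for t, _, _ in schedule})
    items = sorted({x for _, _, x in schedule})
    for ti, tj in combinations(txns, 2):
        for item in items:
            seq = [op for op in schedule if op[2] == item and op[0] in (ti, tj)]
            on_i = [op for op in seq if op[0] == ti]
            on_j = [op for op in seq if op[0] == tj]
            if seq != on_i + on_j and seq != on_j + on_i:
                return True
    return False
```

Applied to the racing example above, with both reads of A preceding both writes, the projection r1 r2 w1 w2 matches neither r1 w1 r2 w2 nor r2 w2 r1 w1, so the check reports racing.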

2.2.3   Inconsistent  Outputs 

Consider  the  following  example  from  [Bern  81]: 

Example

Assume transaction T1 transfers an amount Δ of funds from a savings to a
checking account, and transaction T2 reads the savings and checking
balances and prints the total balance. Let the IC be Check + Sav = C
(C is a constant), and let the execution of T1 and T2 result in the
following schedule (time runs downward):

    T1                        T2
    r Sav
    Sav := Sav - Δ
    w Sav
                              r Sav
                              r Check
                              print(Sav + Check)
    r Check
    Check := Check + Δ
    w Check

The IC is not violated in the above schedule. However, T2 prints an
inconsistent result (namely C - Δ). This occurs because a transaction is
allowed to access a temporarily inconsistent database and report to the
user. This example motivates the following definition of consistent
outputs by individual transactions.
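The schedule can be replayed step by step to confirm both observations. This sketch is our own illustration: the database and each transaction's local workspace are plain dictionaries, the order of steps is the one described in the example (T1 debits and writes Sav, T2 then reads both balances and prints, T1 finally credits Check), and the function name `run_schedule` is hypothetical.

```python
from typing import Dict, List, Tuple


def run_schedule(sav0: int, check0: int, delta: int) -> Tuple[Dict[str, int], List[int]]:
    db: Dict[str, int] = {"Sav": sav0, "Check": check0}
    t1: Dict[str, int] = {}  # T1's local workspace
    t2: Dict[str, int] = {}  # T2's local workspace
    printed: List[int] = []  # outputs of T2 to the user

    t1["Sav"] = db["Sav"]                    # T1: r Sav
    t1["Sav"] = t1["Sav"] - delta            # T1: Sav := Sav - Δ
    db["Sav"] = t1["Sav"]                    # T1: w Sav
    t2["Sav"] = db["Sav"]                    # T2: r Sav
    t2["Check"] = db["Check"]                # T2: r Check
    printed.append(t2["Sav"] + t2["Check"])  # T2: print(Sav + Check)
    t1["Check"] = db["Check"]                # T1: r Check
    t1["Check"] = t1["Check"] + delta        # T1: Check := Check + Δ
    db["Check"] = t1["Check"]                # T1: w Check
    return db, printed
```

With Sav = 70, Check = 30 (so C = 100) and Δ = 10, the final balances still sum to 100, while T2 prints 90, i.e. C - Δ.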

Definition. A transaction T in the schedule S produces consistent
outputs  to  the  user  iff  for  any  initial  consistent  database  state  for  S 
and  for  any  external  inputs  for  all  the  transactions  of  S,  there  is  a 
consistent  state  E  of  the  database  such  that: 

If  T  runs  alone,  faces  this  initial  state  E  and  reads  the  same 
inputs  as  T  in  S,  then  T  in  S  and  T  running  alone  produce  the  same 
outputs. 

Note that T2, in our example, violates the above definition,
because  T2  does  not  face  a  consistent  state  of  the  database  when  it 
starts.  Also,  the  above  definition  is  expressed  in  terms  of 
transaction  external  "writes"  (outputs  to  the  user).  It  is  more 
practical  to  give  a  somewhat  similar  definition  in  terms  of  transaction 
"reads".   Such  a  concept  is  defined  in  the  next  section. 

2.3   Independence 

Intuitively, a transaction T is independent in a schedule when the
actions of other transactions are transparent to it. To define this
notion formally, we have to compare the execution of T in a schedule S
with its execution when T runs alone. We adopt the following notation:


Definition. Let T_S denote a transaction T executing as part of a
schedule S. Let T_A be the same transaction T executing alone.

To compare the executions of T_S and T_A we have to specify the state E
of the database when T_A starts. The initial state E that T_A faces is
defined as follows:

(*) Let V be the tuple of the values of the variables in the read set
of T at the moments when they were referenced for the first time by a
READ operation. Assign these values to the state E. It does not matter
what values are assigned to the database variables not in the read set
of T.
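Rule (*) can be sketched as a pass over an execution log. We assume, purely for illustration, a hypothetical log of (transaction, "r" or "w", item, value) records in which each READ record carries the value that the read returned; the function name `initial_state_for` is ours.

```python
from typing import Dict, List, Tuple

LogRecord = Tuple[str, str, str, int]  # (transaction, "r"/"w", item, value)


def initial_state_for(txn: str, log: List[LogRecord]) -> Dict[str, int]:
    """State E of rule (*): for each item in txn's read set, the value txn saw
    at its first READ of that item. Items outside the read set are omitted,
    since their values do not matter."""
    e: Dict[str, int] = {}
    for t, kind, item, value in log:
        if t == txn and kind == "r" and item not in e:
            e[item] = value
    return e
```

For the savings/checking example with Sav = 70, Check = 30 and Δ = 10, the state E built for T2 is {Sav: 60, Check: 30}, whose total 90 differs from C = 100; that is, T2 does not face a consistent initial state.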


Definition.   A  transaction  T  is  independent  in  a  schedule  S  iff 

(1) There is an initial state E for T_A, defined as in (*), that
satisfies the integrity constraints.

(2) T_S and T_A, for the state E of (1), read the same data values
from the database in all subsequent reads, regardless of what other
transactions in the schedule S write to the database.

Remarks. 

1. Independence is a semantic notion.

2. We have to deal with a subtle situation: when the first "read"
operation on an item A by T_S is preceded by a "write" operation on
the same item. Here, one has the choice to select, for E, the value of
the item at the moment of the first "read" or the value written by the
preceding "write". We have chosen the former approach because of our
requirement that the class of correct schedules be a subset of the
class NR: if any other transaction tried to write into item A
between the first "write" and the first "read" of T_S on A, then racing
would occur. The whole idea of independence is to protect a
transaction from intervening writes by other transactions while it is
in progress (this is Rule 2 of the definition of independence). A
moment at which the protection starts must therefore be chosen; we have
chosen it to be the time of the first "read" of each item.

3. Rule 2 says that T_S and T_A read the same data for the second, third,
etc. "reads" of an item. This means that the following situation is
allowed:


T' is a "harmless" transaction as far as the independence of T is
concerned. Hence, T' is allowed to interleave with T. We could have
stated the definition of independence so that situations like the
above are excluded (the rule then would be that no "writes" on an
item are allowed between consecutive reads of the item).

Definition. A schedule S is independent iff each transaction T of S is
independent in S. Let I be the class of independent schedules.


Definition.   A  schedule  is  correct  iff 

(1)  it  is  independent 

(2)  it  preserves  ICs,  and 

(3)  it  has  no  racing. 

Let  C  be  the  class  of  correct  schedules. 

The  following  proposition  is  immediate. 
Proposition. Independent schedules cannot produce inconsistent outputs
to the user.

However, the converse is not true. Here is a counterexample:


    T1                    T2
    rA
    rB
                          rA
                          rB
                          A := A - Δ
                          wA
    rA
                          B := B + Δ
                          wB
    output B

T1 produces a consistent output, but it is not independent: its second
read of A returns a value different from the one it would read running
alone. This happens because a transaction is allowed to read the same
item several times. Therefore, our class C of correct schedules is a
subset of the class of schedules without inconsistencies. (Had we
defined independence in terms of transaction external writes, the class
of correct schedules would have captured all the schedules which avoid
all types of inconsistencies.)
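The counterexample can be checked by comparing T1 inside the schedule with T1 running alone from the state E of rule (*). The sketch assumes the order of steps described above (T1 reads A and B; T2 reads A and B, subtracts Δ from A and writes it; T1 reads A again; T2 adds Δ to B and writes it; T1 outputs B). The function names `t1_in_schedule` and `t1_alone` are hypothetical.

```python
from typing import Dict, List, Tuple


def t1_in_schedule(a0: int, b0: int, delta: int) -> Tuple[List[int], int]:
    """Replay the counterexample; return (values read by T1, T1's output)."""
    db = {"A": a0, "B": b0}
    t1: Dict[str, int] = {}
    t1["A"] = db["A"]                # T1: rA
    t1["B"] = db["B"]                # T1: rB
    reads = [t1["A"], t1["B"]]
    t2 = {"A": db["A"], "B": db["B"]}  # T2: rA, rB
    t2["A"] -= delta                 # T2: A := A - Δ
    db["A"] = t2["A"]                # T2: wA
    t1["A"] = db["A"]                # T1: rA (second read of A)
    reads.append(t1["A"])
    t2["B"] += delta                 # T2: B := B + Δ
    db["B"] = t2["B"]                # T2: wB
    return reads, t1["B"]            # T1: output B


def t1_alone(e: Dict[str, int]) -> Tuple[List[int], int]:
    """T1 run alone from state E: reads A, B, A again, then outputs B."""
    return [e["A"], e["B"], e["A"]], e["B"]
```

With A0 = B0 = 5 and Δ = 2, both executions output 5, but the schedule's third read returns 3 instead of 5, which violates Rule 2 of independence.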

2.4  Relationships  among  P,  I,  NR  and  C 
Clearly, C = P ∩ I ∩ NR. We now prove

Lemma.

(a) There is a schedule S1 in I − P.

(b) There is a schedule S2 in P − I.

(c) There is a schedule S3 in (I ∩ P) − NR.

Proof:


(a) Here is S1, independent but not IC preserving:

    T1                    T2
    rA
    rB
    A := A + Δ1
    B := B - Δ1
                          rA
                          rB
                          A := A + Δ2
                          B := B - Δ2
    wB
                          wA
    wA
                          wB

The IC is A + B = constant; assume Δ1 ≠ Δ2.

Let A0, B0 be the initial values of A, B. We have (after execution of S1)

$$A = A_0 + \Delta_1, \qquad B = B_0 - \Delta_2$$

$$\Rightarrow\; A + B = A_0 + B_0 + \Delta_1 - \Delta_2 \neq A_0 + B_0, \quad \text{since } \Delta_1 \neq \Delta_2.$$
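A replay of S1 makes the final values concrete. The sketch assumes the interleaving in which both transactions read A and B before any write, and the writes then occur in the order T1: wB, T2: wA, T1: wA, T2: wB (the order that yields A = A0 + Δ1 and B = B0 - Δ2). The function name `run_s1` is hypothetical.

```python
from typing import Dict


def run_s1(a0: int, b0: int, d1: int, d2: int) -> Dict[str, int]:
    """Replay S1: all reads first, then the writes T1:wB, T2:wA, T1:wA, T2:wB."""
    db = {"A": a0, "B": b0}
    t1 = {"A": db["A"], "B": db["B"]}  # T1: rA, rB
    t2 = {"A": db["A"], "B": db["B"]}  # T2: rA, rB
    t1["A"] += d1                      # T1: A := A + Δ1
    t1["B"] -= d1                      # T1: B := B - Δ1
    t2["A"] += d2                      # T2: A := A + Δ2
    t2["B"] -= d2                      # T2: B := B - Δ2
    db["B"] = t1["B"]                  # T1: wB
    db["A"] = t2["A"]                  # T2: wA
    db["A"] = t1["A"]                  # T1: wA (overwrites T2's value)
    db["B"] = t2["B"]                  # T2: wB (overwrites T1's value)
    return db
```

With A0 = 10, B0 = 20, Δ1 = 3, Δ2 = 1 the result is A = 13, B = 19, so A + B changes by Δ1 - Δ2 = 2; with Δ1 = Δ2 the sum would be preserved.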


(b) Here is schedule S2, which is IC preserving but not independent:

    rA
    A := A + Δ
    wA
    rB
    B := B - Δ
    wB

    rA
    rB
    A := A - Δ
    B := B + Δ
    wA
    wB

    rA
    rB
    wB
    wA

IC: "A + B = constant"

T2 is not independent because it reads inconsistent inputs from
the database.
(c) We now construct schedule S3 ∈ (I ∩ P) − NR.

    T1                    T2
    rA
    rB
    A := A + Δ
    B := B - Δ
                          rA
                          rB
                          A := A - Δ
                          B := B + Δ
    wA
    wB
                          wA
                          wB

IC: "A + B = constant"

T1 and T2 are independent. The final state is A = A0 − Δ, B = B0 + Δ,
i.e. A + B = A0 + B0. However, there is a racing condition on both
A and B.
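Both claims about S3 (IC preserved, racing present) can be checked mechanically. The sketch assumes the interleaving in which both transactions read A and B, then T1 writes both items, then T2 writes both items; each write stores the transaction's read value plus a fixed shift (T1 computes A + Δ, B - Δ; T2 computes A - Δ, B + Δ). The names `S3`, `run_s3` and `races_on` are hypothetical.

```python
from typing import Dict, List, Tuple

Op = Tuple[str, str, str]  # (transaction, "r" or "w", item)

# Assumed interleaving of S3.
S3: List[Op] = [
    ("T1", "r", "A"), ("T1", "r", "B"),
    ("T2", "r", "A"), ("T2", "r", "B"),
    ("T1", "w", "A"), ("T1", "w", "B"),
    ("T2", "w", "A"), ("T2", "w", "B"),
]


def run_s3(a0: int, b0: int, delta: int) -> Dict[str, int]:
    """Each write stores the writer's read value of the item plus its shift."""
    shift = {"T1": {"A": delta, "B": -delta}, "T2": {"A": -delta, "B": delta}}
    db = {"A": a0, "B": b0}
    local: Dict[str, Dict[str, int]] = {"T1": {}, "T2": {}}
    for txn, kind, item in S3:
        if kind == "r":
            local[txn][item] = db[item]
        else:
            db[item] = local[txn][item] + shift[txn][item]
    return db


def races_on(item: str) -> bool:
    """Racing check of Section 2.2.2 for the pair T1, T2 on one item."""
    seq = [op for op in S3 if op[2] == item]
    on_t1 = [op for op in seq if op[0] == "T1"]
    on_t2 = [op for op in seq if op[0] == "T2"]
    return seq not in (on_t1 + on_t2, on_t2 + on_t1)
```

From A0 = 10, B0 = 20 and Δ = 3 the schedule ends with A = 7, B = 23, so A + B is unchanged, while `races_on` reports racing on both items: T1's update is lost.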

Figure 2 shows the relationships among these classes.


Figure  2 


2.5 Previous work on the subject
We should acknowledge here the similar work of Garcia-Molina
[Gars 83]. He introduced the notion of a semantically consistent
schedule, which is similar to our notion of correctness, i.e. a
schedule is semantically consistent iff

(1) it is IC preserving

(2) all sensitive transactions obtain a consistent view of the
database. (A transaction is sensitive if it must produce consistent
outputs to the users.)

Our notion of an independent transaction is slightly different.
Garcia-Molina also defines the so-called RS objects (Requiring
Serializability), for which serializable access is required. In other
words, racing is not allowed on these objects. We introduce a
somewhat weaker requirement: serializable behavior of any two
transactions is required on these objects. In terms of [Gars 83], we
assume that all transactions are sensitive and all objects are RS.

From this point on, our approach differs substantially from [Gars
83]. Garcia-Molina developed a scheduler that produces semantically
consistent schedules. He used the concepts of transaction type
(also used in SDD-1, see [Bern 80]) and compatibility.

Our approach is to introduce an intermediate class of schedules,
between the serializable and the correct ones, with an easy membership
test. This is the subject of the next section.

3. The distance between serializability and correctness.
3.1 Linear semantics and serializability

Definition  [Kung  79],  [Casa  80].  A  schedule  is  serializable  if  it  is 
equivalent  to  some  serial  schedule  under  Herbrand  semantics.  Let  us 
call  SR  the  class  of  serializable  schedules. 

It is easy to see that serializable schedules are correct (i.e. in
C). The main body of research in concurrency control was conducted for
serializable  schedules.  It  would  be  nice  to  develop  concurrency 
control  algorithms  for  the  larger  class  of  correct  schedules  and  thus 
increase  the  degree  of  parallelism.  However,  it  is  difficult  to 
construct  membership  tests  for  schedules  in  C.  The  goal  of  this  section 
is  to  find  some  intermediate  class  of  schedules,  properly  contained  in 
C,  and  larger  than  serializable,  for  which  there  exists  a  relatively 
efficient  membership  test. 
Kung and Papadimitriou [Kung 79] show that if only the syntax of
transactions is known, then the largest class of schedules that
guarantees IC preservation is the class of serializable schedules.
Hence, we have to specify some semantics of transaction operations and ICs.

We assume the following semantics:

1. Integrity constraints are a system of n linear inequalities:

$$\sum_{j=1}^{m} c_{ij}\, x_j \le 0, \qquad i = 1, 2, \ldots, n,$$

where $x_1, \ldots, x_m$ are the values of the items $i_1, \ldots, i_m$ of the database
and $\{c_{ij}\}$ are constants.

(Linear  equations  can  be  written  as  a  pair  of  inequalities). 

2. Operations on local variables of a transaction are linear, i.e. of
the form

$$x_i := \sum_{j=1}^{n_i} d_{ij}\, x_j,$$

where $x_1, \ldots, x_{n_i}$ are values of local variables of the transaction,
initialized by the time the current transaction step is executed.
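The two semantic assumptions can be illustrated with a small sketch: a checker for a system of constraints of the form sum_j c_ij x_j <= 0, and an executor for a step x_i := sum_j d_ij x_j. For brevity the sketch keeps database items and local variables in one dictionary, which is our simplification; the names `satisfies` and `linear_step`, and the temporary variable H, are hypothetical.

```python
from typing import Dict, List

Vector = Dict[str, float]  # variable name -> value


def satisfies(ics: List[Vector], x: Vector) -> bool:
    """True iff sum_j c_ij * x_j <= 0 for every constraint row c_i in `ics`."""
    return all(sum(c * x[name] for name, c in row.items()) <= 0 for row in ics)


def linear_step(local: Vector, target: str, coeffs: Vector) -> None:
    """Execute target := sum_j d_j * x_j, where every x_j is an already
    initialized local variable (an unset one raises KeyError)."""
    local[target] = sum(d * local[name] for name, d in coeffs.items())
```

As a usage example, the equation X + Y + Z = 0 becomes the pair of rows {X: 1, Y: 1, Z: 1} and {X: -1, Y: -1, Z: -1}. A transaction that moves half of X into Y through the temporary H (H := 0.5 X; X := X - H; Y := Y + H) uses only linear steps and keeps the constraint satisfied.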

Note  that  a  lot  of  practical  situations  (e.g.  bank  transactions, 
airline  reservation  systems  etc.)  can  be  covered  by  the  assumed  linear 
semantics. Hence, if for the above semantics no proper superset of SR
with an efficient membership test can be found, this would imply that
serializability is a good practical approximation to correctness.

Definition. Two schedules are dependency equivalent if they have the
same dependency graph when only syntactic information about
transaction actions is assumed.
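To make dependency graphs and dependency equivalence concrete, here is a sketch that assumes the usual precedence-graph construction from the literature cited in the introduction: an edge Ti -> Tj whenever a step of Ti precedes a step of Tj on the same item and at least one of the two steps is a write. The proof of Theorem 1 starts from a cycle in this graph. The names `dependency_graph` and `has_cycle` are ours, and the sketch is not claimed to decide equivalence under Herbrand semantics in general.

```python
from typing import Dict, List, Set, Tuple

Op = Tuple[str, str, str]  # (transaction, "r" or "w", item)


def dependency_graph(schedule: List[Op]) -> Dict[str, Set[str]]:
    """Edge Ti -> Tj when a step of Ti precedes a conflicting step of Tj
    (same item, at least one of the two steps is a write)."""
    graph: Dict[str, Set[str]] = {t: set() for t, _, _ in schedule}
    for i, (ti, ki, xi) in enumerate(schedule):
        for tj, kj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "w" in (ki, kj):
                graph[ti].add(tj)
    return graph


def has_cycle(graph: Dict[str, Set[str]]) -> bool:
    """Depth-first search; reaching a node still on the stack means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {v: WHITE for v in graph}

    def visit(v: str) -> bool:
        color[v] = GREY
        for w in graph[v]:
            if color[w] == GREY or (color[w] == WHITE and visit(w)):
                return True
        color[v] = BLACK
        return False

    return any(color[v] == WHITE and visit(v) for v in graph)
```

Under this reading, two schedules s1 and s2 are dependency equivalent exactly when `dependency_graph(s1) == dependency_graph(s2)`.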

The next two theorems are among the main results of this section.
In Theorem 1, we assume the action model for transaction operations,
i.e. that reads and writes of an item occur in pairs only (see
[Ull 82]).

Theorem 1. For any linear integrity constraint consisting of
just one equality or inequality and for any nonserializable schedule S
without racing, there are some linear semantics for the actions of the
transactions such that the integrity constraint is violated.

Proof: Since the schedule S is not serializable, its dependency graph G
has a cycle. Let it be $T_1 \to T_2 \to \cdots \to T_n \to T_1$, $n \ge 2$. Assume that
the dependency $T_i \to T_{i+1}$ occurs because of an item $a_i$ accessed by both
transactions (let $a_0 = a_n$ and $a_1 = a_{n+1}$).

Without loss of generality we assume that our IC is of the form

$$\sum_{j=1}^{n} c_j\, a_j \le 0,$$

where $c_j \neq 0$ for all $j$, and $a_j$, $j = 1, \ldots, n$, are the items defined above.

Since reads and writes occur only in pairs, $T_i$ has two pairs

$$T_i: r\,a_{i-1},\ w\,a_{i-1} \qquad \text{and} \qquad T_i: r\,a_i,\ w\,a_i.$$

Note that there may be several references to $a_i$ in $T_i$ and to $a_i$ in
$T_{i+1}$. Select one pair $r\,a_i / w\,a_i$ in each transaction so that both
$r\,a_i$ and $w\,a_i$ of $T_i$ occur before $r\,a_i / w\,a_i$ of $T_{i+1}$. (This is always possible
because of the dependency $T_i \to T_{i+1}$ and because racing is excluded.)
Assign the trivial semantics "do nothing" (i.e. $r\,a_i;\ a_i := a_i;\ w\,a_i$)
to all other read/write pairs in the schedule (which are not involved
in the cycle). The semantics of the selected pairs are defined as
follows: for $i = 1, 2, \ldots, n-1$ assign, in $T_i$,

$$a_i := a_i + \frac{\Delta}{c_i}, \qquad a_{i-1} := a_{i-1} - \frac{\Delta}{c_{i-1}}$$

(for some $\Delta \neq 0$, to be chosen appropriately). Each such $T_i$ leaves
$\sum_j c_j a_j$ unchanged, since $c_i \cdot \Delta/c_i - c_{i-1} \cdot \Delta/c_{i-1} = 0$.

(Notice that it does not matter how the four actions $r\,a_{i-1}$, $w\,a_{i-1}$,
$r\,a_i$, $w\,a_i$ are interleaved in $T_i$ for this assignment to produce a
constraint-preserving $T_i$.) If the four actions of $T_n$ are in any order
in which $r\,a_{n-1}$ precedes $w\,a_n$, we can assign

$$a_{n-1} := \frac{1}{2}\, a_{n-1} - \frac{\Delta}{c_{n-1}}, \qquad a_n := \frac{c_{n-1}\, a_{n-1}}{2\, c_n} + a_n + \frac{\Delta}{c_n}. \qquad (*)$$

If, however, the four actions are in the order

$$T_n: r\,a_n, \quad T_n: w\,a_n, \quad T_n: r\,a_{n-1}, \quad T_n: w\,a_{n-1},$$

then $a_n$ and $a_{n-1}$ should be interchanged in (*). Without loss of
generality assume the semantics (*) for $T_n$. Since we have defined the
semantics of all actions of all transactions, we can now present the
semantic content of the subschedule of actions with nontrivial
operations:


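$$
\begin{array}{ll}
\text{Transaction} & \text{Action} \\
\hline
T_1 & a_1 := a_1 + \Delta/c_1 \\
T_2 & a_1 := a_1 - \Delta/c_1 \\
T_2 & a_2 := a_2 + \Delta/c_2 \\
\vdots & \vdots \\
T_{n-1} & a_{n-2} := a_{n-2} - \Delta/c_{n-2} \\
T_{n-1} & a_{n-1} := a_{n-1} + \Delta/c_{n-1} \\
T_n & a_{n-1} := \tfrac{1}{2}\, a_{n-1} - \Delta/c_{n-1} \\
T_n & a_n := \dfrac{c_{n-1}\, a_{n-1}}{2\, c_n} + a_n + \Delta/c_n \\
T_1 & a_n := a_n - \Delta/c_n
\end{array}
$$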


Note that each transaction (including $T_n$) preserves the ICs. However, the
schedule does not preserve the ICs.

To see that, note that the new values of $a_1, a_2, \ldots, a_{n-2}$ are
the same as the old ones; also

$$a_{n-1}^{new} = \frac{1}{2}\Bigl(a_{n-1}^{old} + \frac{\Delta}{c_{n-1}}\Bigr) - \frac{\Delta}{c_{n-1}}, \qquad a_n^{new} = \frac{1}{2}\,\frac{c_{n-1}}{c_n}\, a_{n-1}^{old} + a_n^{old},$$

hence

$$c_{n-1} a_{n-1}^{new} + c_n a_n^{new} = \frac{1}{2} c_{n-1} a_{n-1}^{old} + \frac{1}{2}\Delta - \Delta + \frac{1}{2} c_{n-1} a_{n-1}^{old} + c_n a_n^{old} = c_{n-1} a_{n-1}^{old} + c_n a_n^{old} - \frac{1}{2}\Delta.$$

By then selecting a suitable negative $\Delta$ we can have $\sum_{j=1}^{n} c_j a_j > 0$
after the schedule's execution, thus violating the IC. ■

We  conjecture  that  Theorem  1  remains  true  even  if  the  ICs  consist 
of  many  equations  and  inequalities.  For  this  (more  general)  case,  we 
managed  to  prove  the  following  weaker  version  of  Theorem  1. 

Theorem 2. If the number of transactions in any transaction
system is less than $\sqrt{2m}$, where $m$ is the number of database items, then
for any linear IC and for any nonserializable schedule S with more than
two transactions, there exists a dependency equivalent schedule S′ and
appropriate linear semantics for the actions of S′ such that S′ violates
the ICs.

Proof Sketch: First, we select one inequality from the ICs. Let it be
$\sum_j c_j x_j \le 0$. Then we follow the proof of Theorem 1, adding new
variables (and new steps to transactions) $a_1, \ldots, a_{n-2}$ and selecting the
$a_{n-1}$ and $a_n$ from the $x_j$'s (adding new steps to $T_{n-1}$, $T_n$, $T_1$ if
necessary). The resulting schedule S′ will have the same dependency
graph G as S. The total number of variables required is less than
the number of edges of a complete graph on G's vertices; with $k$ the
number of transactions, it is therefore less than

$$\frac{k(k-1)}{2} < \frac{\sqrt{2m}\,\bigl(\sqrt{2m}-1\bigr)}{2} < m. \qquad ■$$

We assume that the reader is familiar with the notion of optimal
schedulers, introduced in [Kung 79].

Corollary 1. The serializable scheduler is the optimal one for the
following information:

(1) the ICs are linear inequalities, and they are given;

(2) the semantics of operations are linear;

(3) the syntax is given.

3.2 Weak Serializability

Definition. Assume transaction system τ consists of n transactions. A
schedule S is weakly serializable of order k† (k < n) if any
subschedule S′ of S consisting of exactly k transactions of τ is
serializable. (The class of the weakly serializable schedules of order
k is denoted $\mathrm{WSR}_k$.)

† Do not confuse this weak serializability notion with the weak
serializability defined by other authors, e.g. in [Casa 80], [Papa 79].
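Membership in $\mathrm{WSR}_k$ reduces to C(n, k) serializability tests on subschedules, which is polynomial for a fixed k such as 2. The sketch below is our own illustration under an explicit assumption: it uses acyclicity of the conflict (dependency) graph as the serializability test for each subschedule, in place of equivalence under Herbrand semantics. The names `conflict_acyclic` and `weakly_serializable` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Op = Tuple[str, str, str]  # (transaction, "r" or "w", item)


def conflict_acyclic(schedule: List[Op]) -> bool:
    """Stand-in serializability test: the conflict graph has no cycle."""
    graph: Dict[str, Set[str]] = {t: set() for t, _, _ in schedule}
    for i, (ti, ki, xi) in enumerate(schedule):
        for tj, kj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "w" in (ki, kj):
                graph[ti].add(tj)
    # Kahn's algorithm: the graph is acyclic iff every node gets removed.
    indeg = {v: 0 for v in graph}
    for v in graph:
        for w in graph[v]:
            indeg[w] += 1
    ready = [v for v, d in indeg.items() if d == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for w in graph[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return removed == len(graph)


def weakly_serializable(schedule: List[Op], k: int) -> bool:
    """True iff every subschedule on exactly k transactions passes the test."""
    txns = sorted({t for t, _, _ in schedule})
    for group in combinations(txns, k):
        sub = [op for op in schedule if op[0] in group]
        if not conflict_acyclic(sub):
            return False
    return True
```

For example, a racing-free schedule whose dependencies form the cycle T1 -> T2 (on X), T2 -> T3 (on Y), T3 -> T1 (on Z) passes the test for k = 2, since every pair is serializable, but fails it for k = 3, which is the whole schedule.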

The most interesting cases are those of k = 2 and k = n−1.

[Schl 78] calls our weak serializability of order 2 "integrity in the
weak sense". We shall call the weakly serializable schedules of order
n−1 almost serializable.

Corollary 2. For any linear IC, any transaction system τ of n > 2
transactions, and any almost serializable but not serializable schedule
S of τ, there are linear semantics of the transaction actions such
that S does not preserve the IC.

Corollary 3. For any linear IC, any transaction system τ of n > 2
transactions, and any weakly serializable schedule S of order 2 (of τ)
which is not serializable, there are suitable linear semantics of the
transaction actions such that S does not preserve the IC.
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Example  (to  Corollary  3). 


Schedule S

rX
X := X + Δ
rX
X := X − Δ
wX
rY
Y := Y + Δ
wY
rY
rZ
Y := Y(α+1) − Z(β−1)
Z := αY + βZ
wZ
wY
rZ
Z := Z + Δ
wZ

The IC is "X + Y + Z = 0". Let us assume that X = X₀, Y = Y₀, Z = Z₀ just before S. After execution of S,


$$X + Y + Z = X_0 + Y_0 + Z_0 - \alpha^2 Y_0 - (\alpha - \alpha\beta) Z_0 + \Delta,$$


i.e., X + Y + Z ≠ X₀ + Y₀ + Z₀ = 0 (for suitable α, β, Δ).
3.3 Properties of weak serializability
Theorem 1 says that many nonserializable schedules are not correct (for a very practical meaning of "correctness"). However, the question of whether an intermediate class of schedules (between C and SR) has an easy membership test is still largely open. One way to continue the search is to further restrict the semantics of transaction operations. The following result is straightforward:

Lemma. If the semantics of transaction operations are of the form A := A + Δ, where Δ does not depend on data item values, then any schedule is equivalent to a serial schedule (in terms of the final database state).

Proof sketch: The effect of any schedule is the same: it adds to each item a (signed) sum of Δ's. Since addition is commutative and associative, the order of the additions does not matter. •
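The lemma can be checked on a small case. The sketch below assumes that each operation A := A + Δ executes as a single atomic step, so no lost updates can occur. The transactions T1 and T2 and their Δ values are hypothetical. The code enumerates every interleaving of the two transactions that preserves each transaction's internal order. It then confirms that every interleaving reaches the same final state as the serial schedule.

```python
# Hypothetical transactions: each step is an atomic increment (item, delta),
# i.e. "item := item + delta" executed as a single indivisible step.
T1 = [("X", 5), ("Y", -5)]
T2 = [("Y", 3), ("Z", -3)]
initial = {"X": 0, "Y": 0, "Z": 0}


def run(steps, state):
    """Apply the increments in order to a copy of the state."""
    state = dict(state)
    for item, delta in steps:
        state[item] += delta
    return state


def interleavings(a, b):
    """All merges of a and b that keep each transaction's own step order."""
    if not a or not b:
        yield a + b
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


serial = run(T1 + T2, initial)
print(serial)  # {'X': 5, 'Y': -2, 'Z': -3}
print(all(run(s, initial) == serial for s in interleavings(T1, T2)))  # True
```

If a read and its write were separate steps instead, interleavings could lose updates, and the lemma would no longer apply.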

We now show that, if the IC are preserved, then weak serializability of order 2 implies independence. Since WSR₂ eliminates racing, we conclude that for some (strong) semantics, WSR₂ ⊆ C.

Theorem 3. If S ∈ WSR₂ and S preserves the integrity constraints, then S ∈ I. That is, if the semantics are such that S ∈ WSR₂ ⇒ S ∈ P, then S ∈ I.

Proof: Consider a particular transaction T in a schedule S ∈ WSR₂. We shall prove that T is independent of the rest of the schedule by proving (1) that the items in the readset of T, encountered for the first time, satisfy the IC and (2) that T reads the same values whether it runs alone or within the schedule S (of the transaction system τ).

To prove the first condition, let us denote by Hᵢ the set of items which belong to both the readset of T and the writeset of Tᵢ, where Tᵢ ∈ τ and Tᵢ ≠ T. Since Tᵢ and T are serializable in S, all writes of items in Hᵢ by Tᵢ must be done either before or after T (in S). Let

τ' = { Tᵢ | writes of items in Hᵢ by Tᵢ happen before T }

It is easy to see that the writes of items in Hᵢ by transactions in τ − τ' are done after T. Hence, the values read by T are the outputs of the subschedule of S consisting of the transactions in τ'. We also have to choose the values of the items not in the readset of T (to provide an "initial state" for T; see our definition of independence). We naturally choose them to be the final values of these items when all transactions in τ' finish. Hence, the initial state E of the database for T (running alone) is the state of the database when the transactions in τ' finish their executions.
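The construction of τ' can be phrased as a small procedure. In this Python sketch, the names are hypothetical and the schedule is encoded as (transaction, operation, item) triples. Tᵢ is placed in τ' when one of its writes to an item of Hᵢ precedes a read of that item by T. This rule is our assumption. Because every pair of transactions is serializable in S ∈ WSR₂, it should agree with "writes of items in Hᵢ by Tᵢ happen before T".

```python
def readset(schedule, t):
    return {item for txn, op, item in schedule if txn == t and op == "r"}


def writeset(schedule, t):
    return {item for txn, op, item in schedule if txn == t and op == "w"}


def tau_prime(schedule, t):
    """Transactions T_i whose write of some item in H_i (= RS(t) ∩ WS(T_i))
    is followed later in the schedule by a read of that item by t."""
    rs = readset(schedule, t)
    result = set()
    for ti in {step[0] for step in schedule} - {t}:
        h_i = rs & writeset(schedule, ti)
        for pos, (txn, op, item) in enumerate(schedule):
            if txn == ti and op == "w" and item in h_i:
                if (t, "r", item) in schedule[pos + 1:]:
                    result.add(ti)
                    break
    return result


# Hypothetical schedule: T1 writes A before T reads it; T2 writes B only
# after T has read B; T3 writes an item that T never reads.
schedule = [("T1", "w", "A"), ("T", "r", "A"), ("T", "r", "B"),
            ("T2", "w", "B"), ("T3", "w", "C")]
print(tau_prime(schedule, "T"))  # {'T1'}
```

The initial state E for T would then be the state reached by running only the steps of the transactions in τ'.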

Let S' be the subschedule of S consisting of only the transactions in τ'. Since τ' ⊆ τ, we have S' ∈ WSR₂.

Claim.   S'  preserves  IC. 

Proof: The transaction actions of S' have the same semantics as those of S, and S ∈ P ∩ WSR₂. Also S' ⊆ S and S' ∈ WSR₂. Hence S' ∈ P. •

By  the  above  claim,  E  is  a  consistent  state.  This  proves 
condition  (1). 
To prove condition (2), note first that T running alone and T running within S can read different values in only two cases:

(1) there are two reads of the same item in T, and some other transaction Tᵢ writes the same item in between;

(2) the first reference to some item A in T is a "write A", and a subsequent "read A" of T was preceded by a "write A" of some other transaction Tᵢ (see Figure 4).

Case 1              Case 2

T       Tᵢ          T       Tᵢ
rA                  wA
        wA                  wA
rA                  rA

Figure 4

But in both cases, the subschedule consisting of just T and Tᵢ is not serializable. This contradicts the fact that S ∈ WSR₂, so these cases cannot happen. This proves the second condition. Note that our proof holds for any T ∈ τ. •
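The two patterns of Figure 4 can be checked mechanically. For a schedule of only two transactions, conflict serializability reduces to one requirement: all conflicting pairs of steps must be ordered in the same direction. The sketch below assumes that notion of serializability, and its names are hypothetical. It confirms that both Case 1 and Case 2 fail the test, while a pattern with both reads of T before the write of Tᵢ passes.

```python
def pair_serializable(schedule):
    """Two-transaction schedule of (transaction, op, item) steps:
    conflict-serializable iff all conflicts point the same way."""
    directions = set()
    for i, (ta, oa, xa) in enumerate(schedule):
        for tb, ob, xb in schedule[i + 1:]:
            if ta != tb and xa == xb and "w" in (oa, ob):
                directions.add((ta, tb))  # ta must precede tb
    return len(directions) <= 1


case1 = [("T", "r", "A"), ("Ti", "w", "A"), ("T", "r", "A")]  # Figure 4, Case 1
case2 = [("T", "w", "A"), ("Ti", "w", "A"), ("T", "r", "A")]  # Figure 4, Case 2
ok = [("T", "r", "A"), ("T", "r", "A"), ("Ti", "w", "A")]     # both reads first

print(pair_serializable(case1), pair_serializable(case2), pair_serializable(ok))
# False False True
```

In both cases of Figure 4, T must precede Tᵢ because of the first conflict, and Tᵢ must precede T because of the second. This is exactly the contradiction used in the proof.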

Corollary 4. If the semantics are such that S ∈ WSR₂ ⇒ S ∈ P, then also S ∈ C.

Proof: By Theorem 3, S ∈ I. But WSR₂ ⊆ NR, so S ∈ NR. Hence S ∈ (I ∩ NR ∩ P) = C. •
Note: One can ask the following question: if the semantics are such that WSR₂ ⊆ P, is the inclusion WSR₂ ⊆ C proper? The answer is yes. Here is an example of a schedule in C − WSR₂.


T₁       T₂

rA
rA
wA
rB
rB
wB

The IC are the empty set. This schedule is correct but not serializable (hence not in WSR₂).

Now we state another interesting property of WSR₂:

Definition: Schedules S₁ (of transaction system τ₁) and S₂ (of transaction system τ₂) are called IC-equivalent iff, for any initial consistent state of the database and for any inputs of τ₁ and τ₂, the following holds:

If the inputs of the same transactions in τ₁ and τ₂ are equal, then
(*) the final state of the database after execution of S₁ is consistent iff the final state of the database after execution of S₂ is consistent.

Theorem 4. If S ∈ WSR₂, then for any transaction T₀ of S there exists an IC-equivalent schedule S' (of the same transaction system as S) in which all steps of T₀ are grouped together.

Proof:  See  Appendix  I. 
Appendix I

Proof  of  Theorem  4 

We  need  the  following  definitions  and  notation. 

Let WSᵢ be the writeset of transaction Tᵢ.

Let RSᵢ be the readset of Tᵢ.

We do not require WSᵢ ⊆ RSᵢ.

Let Hᵢⱼ be equal to (WSᵢ ∩ WSⱼ) ∪ (WSᵢ ∩ RSⱼ) ∪ (RSᵢ ∩ WSⱼ) (for i ≠ j). Hᵢⱼ is called the set of conflicting items of Tᵢ and Tⱼ.
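The definition of Hᵢⱼ translates directly into set operations. This is a minimal Python sketch, with readsets and writesets given as hypothetical sets of item names. Because WSᵢ need not be a subset of RSᵢ, the write-write term WSᵢ ∩ WSⱼ is not implied by the other two terms and has to be kept.

```python
def conflicting_items(ws_i, rs_i, ws_j, rs_j):
    """H_ij = (WS_i ∩ WS_j) ∪ (WS_i ∩ RS_j) ∪ (RS_i ∩ WS_j), for i != j."""
    return (ws_i & ws_j) | (ws_i & rs_j) | (rs_i & ws_j)


# Hypothetical sets: T_i reads A, B and writes C; T_j reads C and writes B, D.
h_ij = conflicting_items({"C"}, {"A", "B"}, {"B", "D"}, {"C"})
h_ji = conflicting_items({"B", "D"}, {"C"}, {"C"}, {"A", "B"})
print(sorted(h_ij), h_ij == h_ji)  # ['B', 'C'] True
```

As the example shows, the definition is symmetric: Hᵢⱼ = Hⱼᵢ.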

The proof of Theorem 4 is by induction on the number of steps of T₀ that we can group together. If T₀ consists of just one step, then there is nothing to prove (i.e., S' = S). Assume, for the sake of induction, that we can group n−1 steps of T₀ together and obtain a schedule S₁ which is IC-equivalent to S. We shall show how to put together n steps of T₀. Without loss of generality, assume the nth step of T₀ lies above the grouped steps in S₁. Let this step be of the form "act on item A" (for simplicity, we do not distinguish between reads and writes in this proof). This step can be interchanged with any step of any other transaction not referencing A, and the schedule remains IC-equivalent to S. Let us move this nth step of T₀ down towards the group of n−1 steps of T₀. We continue doing this until we encounter a step of some other transaction Tᵢ that also references A and therefore conflicts with the nth step of T₀. The resulting schedule is IC-equivalent to S and looks like


T₀: act on A
Tᵢ: act on A

T₀: act on H₀ᵢ


(where "act on H₀ᵢ" is a group of steps of T₀ on items from H₀ᵢ, not including A). Since T₀ acts on A before Tᵢ does, T₀ must precede Tᵢ in any equivalent serial order, and T₀ and Tᵢ are serializable. Hence the steps "Tᵢ: act on H₀ᵢ" cannot appear before "T₀: act on H₀ᵢ". So the only possibility is


T₀: act on A
Tᵢ: act on A

T₀: act on H₀ᵢ

Tᵢ: act on H₀ᵢ


In this case, both "T₀: act on A" and "Tᵢ: act on A" can be moved down, and the resulting schedule remains IC-equivalent to S₁.

If steps of other transactions referencing A and conflicting with "T₀: act on A" or "Tᵢ: act on A" are encountered during this process, we push them down as well.

Notice that T₀ cannot have any step conflicting with "Tᵢ: act on A" below it. Otherwise, the schedule would contain the following three steps in order:

T₀: act on A

Tᵢ: act on A

T₀: act on A

and T₀ and Tᵢ would be nonserializable.

Hence, steps of other transactions conflicting with "T₀: act on A" can be pushed down through the group of steps of T₀, and the resulting schedule remains IC-equivalent to S₁ (and thus to S).

We conclude that, in any case, we can group the nth step of T₀ together with the group of the n−1 steps and obtain a schedule S₂ that is IC-equivalent to S₁. Since IC-equivalence is transitive, S₂ is IC-equivalent to S. This completes the proof of the induction step. •
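The first stage of the induction step can be sketched in Python. In this stage, the nth step of T₀ slides down past steps of other transactions that reference different items. The helper `sink_step` is hypothetical and does not come from this report. It encodes a schedule as (transaction, action, item) triples. Like the proof, it ignores the read/write distinction, so a step of another transaction blocks the move exactly when it references the same item. The step stops either when it becomes adjacent to the grouped steps (index `target`) or at the first blocking step.

```python
def sink_step(schedule, pos, target):
    """Slide schedule[pos] toward index `target` (the first step of the
    already grouped steps of the same transaction), swapping it with the next
    step as long as that step belongs to another transaction and references a
    different item. Steps are (transaction, action, item) triples; as in the
    proof, reads and writes are not distinguished."""
    s = list(schedule)
    txn, item = s[pos][0], s[pos][2]
    while pos + 1 < target:
        nxt = s[pos + 1]
        if nxt[0] == txn or nxt[2] == item:
            break  # blocked: same item, or another step of the same transaction
        s[pos], s[pos + 1] = s[pos + 1], s[pos]
        pos += 1
    return s, pos


# Hypothetical schedule: T0's step on A sits above its grouped steps on D, E.
s0 = [("T0", "act", "A"), ("T2", "act", "B"), ("T1", "act", "C"),
      ("T0", "act", "D"), ("T0", "act", "E")]
print(sink_step(s0, 0, 3)[1])  # 2: now adjacent to the group

blocked = [("T0", "act", "A"), ("T1", "act", "A"), ("T0", "act", "D")]
print(sink_step(blocked, 0, 2)[1])  # 0: blocked by T1's step on A
```

In the proof, a blocking step such as Tᵢ's step on A is then moved down together with T₀'s step. That second stage is not implemented in this sketch.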

